{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Getting Started with Projections"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This tutorial shows some basic functionality for analysing MD trajectories with HTMD's `Projection` classes. First, let's import HTMD:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Please cite HTMD: Doerr et al.(2016)JCTC,12,1845. \n",
      "https://dx.doi.org/10.1021/acs.jctc.6b00049\n",
      "Documentation: http://software.acellera.com/\n",
      "To update: conda update htmd -c acellera -c psi4\n",
      "\n",
      "You are on the latest HTMD version (unpackaged : /home/joao/maindisk/software/repos/Acellera/htmd/htmd).\n",
      "\n",
      "Populating the interactive namespace from numpy and matplotlib\n"
     ]
    }
   ],
   "source": [
    "from htmd.ui import *\n",
    "config(viewer='webgl')\n",
    "# for inline plotting in Jupyter notebooks\n",
    "%pylab inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Projecting simulations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "Get the data for this tutorial [here](http://pub.htmd.org/tutorials/projections/data.tar.gz). Alternatively, you can download the data using `wget`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "assert os.system('wget -rcN -np -nH -q --cut-dirs=2 -R index.html* http://pub.htmd.org/tutorials/projections/data/') == 0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "We will be doing two types of data projections:\n",
    "* Projecting single Molecule objects\n",
    "* Projecting lists of simulations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Projecting single Molecule objects"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One can load a previously simulated MD trajectory (in this case, of the NTL9 protein) into a `Molecule` object and visualize it using:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2018-03-15 23:57:31,506 - htmd.molecule.molecule - INFO - Removed 17 atoms. 631 atoms remaining in the molecule.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "7fa612a772bd414eae15f79ed23e8421",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "A Jupyter Widget"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "mol = Molecule('data/ntl9_structure.psf')\n",
    "mol.read('data/ntl9_trajectory.xtc')\n",
    "mol.filter('protein')\n",
    "mol.view()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Calculating the RMSD over time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "HTMD has many projection classes. One of those classes is `MetricRmsd`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using the `MetricRmsd` class, one can calculate the RMSD of each frame of a `Molecule` object against a reference structure (`refmol`, e.g. the crystal structure) for a given atom selection (`trajrmsdstr`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 16.36052132  15.79398251  14.12205887  14.77201653  13.90877914\n",
      "  12.92834282  11.54159355  10.69192505  10.77845764  12.86611652]\n"
     ]
    }
   ],
   "source": [
    "crystal = Molecule('data/ntl9_crystal.pdb')\n",
    "metr = MetricRmsd(refmol=crystal, trajrmsdstr='protein and name CA')\n",
    "proj = metr.project(mol)\n",
    "print(proj)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The values are in Ångströms. They are quite high because the sample trajectory is of the unfolded protein (see the visualization above)."
   ]
  },
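  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of what an RMSD projection measures, the sketch below computes a per-frame RMSD with plain NumPy on made-up coordinates. It is not HTMD's implementation: it skips any structural alignment and assumes the frames are already superposed on the reference. The function `rmsd_per_frame` and the toy arrays are hypothetical names introduced only for this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def rmsd_per_frame(coords, ref):\n",
    "    # coords: (n_frames, n_atoms, 3), ref: (n_atoms, 3); no alignment is done here\n",
    "    diff = coords - ref                        # ref is broadcast over all frames\n",
    "    sq_dist = np.sum(diff ** 2, axis=2)        # squared displacement of each atom\n",
    "    return np.sqrt(np.mean(sq_dist, axis=1))   # one RMSD value per frame\n",
    "\n",
    "# Toy data: frame 0 equals the reference, frame 1 is shifted by 3 units along x\n",
    "ref = np.zeros((4, 3))\n",
    "coords = np.stack([ref, ref + [3.0, 0.0, 0.0]])\n",
    "toy_rmsd = rmsd_per_frame(coords, ref)\n",
    "print(toy_rmsd)"
   ]
  },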
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Plotting the RMSD over time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The previously obtained RMSD values can be easily plotted using matplotlib:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEKCAYAAAAfGVI8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvFvnyVgAAIABJREFUeJzt3Xd4VHX6/vH3k95IAiQQeui9R5oKdimCLlaaogiirrvW365bbOt+1bVXkCbqIpa1iwVclSI1KL0ngPSElhBKQpLn90eGNWLKEDNzpjyv65qLyeTMOTfnSubOmTmfzxFVxRhjTPAKcTqAMcYYZ1kRGGNMkLMiMMaYIGdFYIwxQc6KwBhjgpwVgTHGBDkrAmOMCXJWBMYYE+SsCIwxJsiFOR3AHUlJSZqamup0DGOM8SvLly/fr6rJlS3nF0WQmppKenq60zGMMcaviMh2d5azt4aMMSbIWREYY0yQsyIwxpggZ0VgjDFBzorAGGOCnBWBMcYEOSsCY4wJcgFdBAu37Gfagq2cLCp2OooxxvisgC6CL9bs5ZHP1jHw+fks2Lzf6TjGGOOTAroIHrm8PZNGdSe/sJiRU5dwy5vp7Dh4zOlYxhjjUwK6CESES9qnMPuuvtx3aWvmbdrPhc/M5enZGzlWUOh0PGOM8QkBXQSnRIWHcvv5Lfjm3n4M6JDCi99s4cKn5/LJyt2oqtPxjDHGUUFRBKfUS4jm+eu68t743tSKjeAPM3/k2lcXs3Z3jtPRjDHGMUFVBKeclVqLT35/Do8N7ciW7DwGv7iAv364moNHC5yOZowxXheURQAQGiIM69GYb+85jxv6pPL2sh2c/9R3vL5wG4V2uqkxJogEbRGckhATzoOD2/PFH8+lQ4N4HvxkLYNeWMDCLXa6qTEmOAR9EZzSqm4N/j2mJxNHdudoQSHDpyzhthnL2XnITjc1xgQ2jxWBiEwTkSwRWXPa43eIyEYRWSsi//LU9qtCROjfIYWv7+7HPRe34tsN2Vz49FyenbOJ4wVFTsczxhiP8OQRwXSgf+kHROR84HKgk6q2B57y4ParLCo8lDsubMl/7+nHJe1TeP6/m7nombnMWrXHTjc1xgQcjxWBqs4DDp728K3A46qa71omy1Pbrw71E6N5cVhX3hnXi/jocG5/6weum7SY9XtynY5mjDHVxtufEbQCzhWRJSIyV0TO8vL2q6Rns9p8dsc5PHpFBzbtO8KgF+bz94/WcMhONzXGBABvF0EYUBPoBdwHvCsiUtaCIjJORNJFJD07O9ubGcsUGiKM7NWEb+89j1G9mjBjyXbOf/o73ly8naJie7vIGOO/vF0EO4EPtMRSoBhIKmtBVZ2kqmmqmpacnOzVkBVJjIng4cs78Pkfz6VtSjx//2gNg16Yz+LMA05HM8aYKvF2EXwEXAAgIq2ACMAvT9hvkxLPW2N7MmFEN46cKOS6SYu5/a0f2HX4uNPRjDHmjHjy9NGZwCKgtYjsFJExwDSgmeuU0reBG9SPT8MREQZ0rMfXd/fjzota8vW6fVz49Hc8//VmTpy0002NMf5B/OF1OC0tTdPT052OUamdh47x2OcbmLV6Dw0So/nboLb075BCOR+DGGOMR4nIclVNq2w5G1lcjRrWjOHlEd2YObYXNaLCuHXGD4yYsoSNe484Hc0YY8plReABvZuXnG76j8vbs3Z3LgNfmM+rczOcjmWMMWWyIvCQsNAQRvVO5bt7z+OCNnX411cb2bzPjgyMMb7HisDDasZG8PjQjsRGhPLIZ+tsigpjjM+xIvCC2nGR3HlRK+Zv3s/X6316Vg1jTBCyIvCSUb2b0KJOHI/OWkd+oZ1aaozxHVYEXhIeGsIDl7Vj+4FjTFuwzek4xhjzP1YEXtS3VTIXta3LS99sJiv3hNNxjDEGsCLwur8NasvJIuWJLzc6HcUYYwArAq9LTYrlpnOa8v4PO1mx
47DTcYwxxorACb+/oAXJNSJ56JO1FNsU1sYYh1kROCAuMow/9W/Dih2H+WjFLqfjGGOCnBWBQ4Z2bUDnRok8/sUGjuYXOh3HGBPErAgcEhIiPDi4HVlH8nn52y1OxzHGBDErAgd1a1yToV0bMGX+Vn46cMzpOMaYIGVF4LA/DWhDWKjw6Kx1TkcxxgQpKwKH1Y2P4vbzWzB73T4WbPbLq3YaY/ycFYEPGHNOUxrXiuGRz9ZSWFTsdBxjTJCxIvABUeGh/HVQWzbty2PGkp+cjmOMCTJWBD7iknZ1OadFEs/M2cShowVOxzHGBBErAh8hIjwwuB15+YU8M2eT03GMMUHEisCHtKpbg1G9mjBjyXbW78l1Oo4xJkhYEfiYOy9qSUJ0OI98ape1NMZ4hxWBj0mMieDuS1qzKPMAX67Z63QcY0wQsCLwQcPOakSblBr88/P1nDhpl7U0xniWFYEPCgsN4YHB7dh56DiT52U6HccYE+A8VgQiMk1EskRkTanHHhKRXSKywnUb6Knt+7s+zZMY0CGFV77LYE/OcafjGGMCmCePCKYD/ct4/FlV7eK6fe7B7fu9vwxsS5Eqj3+xwekoxpgA5rEiUNV5wEFPrT8YNKoVwy19m/Hxit2kb7NdaYzxDCc+I/i9iKxyvXVU04Ht+5Vbz2tOSnwUD3+6zi5raYzxCG8XwQSgOdAF2AM8Xd6CIjJORNJFJD07O9tb+XxOTEQY9w9sw+pdOfxn+U6n4xhjApBXi0BV96lqkaoWA5OBHhUsO0lV01Q1LTk52XshfdCQzvXp3qQm//pqA7knTjodxxgTYLxaBCJSr9SXvwPWlLes+ZmI8NDg9hw4WsBL39hlLY0x1cuTp4/OBBYBrUVkp4iMAf4lIqtFZBVwPnCXp7YfaDo2TODq7g157futZGbnOR3HGBNAPHnW0DBVraeq4araUFWnquooVe2oqp1UdYiq7vHU9gPRfZe2ITIslEdnrXc6ijEmgNjIYj+SXCOSP1zYgm82ZPHtxiyn4xhjAoQVgZ8Z3acpTZNi+cdn6ygo9M/LWh4vsPmTjPElVgR+JiIshL9f1pbM7KO8sWib03HOSH5hEU99tZEOD33F3z9aQ5GNizDGJ1gR+KEL2tTlvNbJPP/1Zvbn5Tsdxy1rduUw5MXveenbLXRumMCbi7dz67+X2+yqxvgAKwI/9bdB7Th+suQvbF9WUFjMM3M2cfnL33PoWAHTRqfxwW1n8/CQ9sxZv48RU5bYNZqNcZgVgZ9qUSeO0X1SeSd9B2t25Tgdp0xrd+dw+cvf88J/N3N5l/rMuasfF7SpC8ANfVJ5ZXg3Vu/K4cqJC9lx8JjDaY0JXlYEfuyOC1tSKyaChz9d61OXtTxZVMxzX2/i8pe+Z39ePlOuT+OZa7qQEBP+i+UGdKzHjJt7sv9IPkMnLGTtbt8sNGMCnRWBH0uIDue+S1uzbNshPl3lG0My1u/J5YqXv+e5rzdzWad6zLmrLxe1q1vu8mel1uL9W/sQHiJc++pi5m8O3nmljHGKlPeXpIgMreiJqvqBRxKVIS0tTdPT0721Ob9SVKwMeWkBB48W8M095xEdEepIjpNFxUz8LoMXvtlMQnQ4//xdRy5tn+L28/fmnGD0a0vZkpXHk1d34nddG3owrTHBQUSWq2paZcuFVfC9wRV8TwGvFYEpX2iI8NCQ9lw9cRET5mZw98WtvJ5h494j3PveSlbvymFw5/o8PKQ9tWIjzmgdKQlRvDu+N7e8sZy73lnJ3px8xvdrhoh4KLUx5pRyi0BVb/RmEFN1Z6XWYnDn+rw6N4Nr0hrSsGaMV7ZbWFTMq/Myef7rzdSICmPCiG4M6Fiv8ieWIz4qnOk3ncV9763iiS83sDfnOA8Mbk9oiJWBMZ5U0RHB/4jIIKA9EHXqMVV9xFOhzJm7f0Ab5qzby2Ofb+DlEd08vr3N+0qOAlbuzGFQp3o8MqQ9teMif/N6I8NCee7aLqQkRDFpXib7
cvN57rouRIU785aXMcGg0g+LRWQicC1wByDA1UATD+cyZ6h+YjS39mvBrNV7WJx5wGPbKSwqZsJ3GQx6YQE7Dh3n5eHdeHl4t2opgVNCQoS/DGzLA5e146t1exk1dQmHj9lYA2M8xZ2zhvqo6vXAIVV9GOgNNPJsLFMVt/RrRoPEaB7+dJ1Hpm/YkpXHVRMX8cSXG7iwbR1m39WXQZ2q/lZQZW46pykvDuvKyh05XDVxEbsOH/fYtowJZu4UwanfvmMiUh84CTT1XCRTVVHhofxlYFvW78nl7WU/Vdt6i4qVSfMyGPjCfLYdOMoLw7ryyohuJFXjUUB5LutUnzfG9GBf7gmGvvI963bnenybxgQbd4rgMxFJBJ4EfgC2AW97MpSpuoEdU+jZtBZPfbWRnGO//bKWmdl5XD1xIf/3+Qb6tUpm9l19GdK5vlfP5unVrDb/Gd+HEBGueXUR32/Z77VtGxMM3CmCf6nqYVV9n5LPBtoAj3o2lqkqEeHBwe3JOX6S5/67qcrrKSpWpszPZMDz88nIPspz13Zh0qju1KkRVfmTPaB1Sg0+uK0PDRKjGf3aUj5escuRHMYEIneKYNGpO6qar6o5pR8zvqdd/XiG9WjMG4u2s3nfkTN+/tb9R7n21UU8Oms957ZMYs5dfbmiawPHz+mvlxDNu+N7061xTf749gomzcvwqak1jPFX5RaBiKSISHcgWkS6ikg31+08wDsnqpsqu+eS1sRGhPLIZ+vcfrEsLlamLdjKgOfnsWnfEZ6+ujOTr0+jTrwzRwFlSYgO540xPRjUqR7/9/kGHvlsHcV2XQNjfpOKxhFcCowGGgJPU3LqKMAR4C+ejWV+q1qxEdx1cSse/nQdX6/P4uIK5vsB2H7gKPf9ZxVLtx7k/NbJPDa0EykJvlMApUWGhfLidV1JiY9i6oKtZOXm8/Q1nW2sgTFVVO5cQ/9bQORK1+cDjrG5hqrmZFExA5+fT0FRMbPv6ktk2K9fKIuLlTcXb+fxLzYQFiI8MLgdV3Vv6PjbQO6aMj+TR2etp0fTWkwelfarGU6NCWbuzjXkzmcEDUUkXkpMEZEfROSSashoPCw8NIQHBrdj+4FjTFuw7Vff33HwGMOnLObBT9bSo2ktZt/dl6vTGvlNCQDcfG4zXhjWlRU/HeaqiQvZbWMNjDlj7hTBTaqaC1wC1AFuBB73aCpTbc5tmcxFbevy0jebyco9Afx8FHDpc/NYsyuXJ67syPQbz6JeQrTDaatmSOf6TL/pLPbmnGDoKwvZsNfGGhhzJtwpglN/Hg4EXlPVlaUeM37gb4PacrJIeeLLjew4eIyRU5fw94/W0L1JTb66qy/XntXYr44CytKneRLv3dobRbl6wiIWZXhumg1jAo07RbBcRGZTUgRfiUgNoNizsUx1Sk2K5aZzmvL+Dzu59Ll5rNxxmP/7XUfeuKkHDRL98yigLG1S4vngtrNJSYjihmlL+XTlbqcjGeMX3CmCMcCfgbNU9RgQQcnbQ8aP/P6CFqTWjqFb45KjgOE9/f8ooCwNEqP5z/g+dGmUyB0zf2TK/EynIxnj89wpgneBekAugKoeUNVVlT1JRKaJSJaIrCnje/eKiIpI0hknNlUSFxnGt/eex79v7um16xU4JSGmZKzBwI4pPDprPf+wsQbGVMidIpgIDAc2i8jjItLGzXVPB/qf/qCINAIuBqpvVjTjlkA8AihPVHgoLw7rxug+qUxdsJU/vP0j+YVFTscyxidVWgSq+rWqjgC6UTLh3BwRWSgiN4pIuSdtq+o84GAZ33oW+H+UXO7SGI8JDREeHNyO+we04bNVe7hh2lJyjv/2ifiMCTTuHBEgIrUpGWV8M/Aj8DwlxTDnTDYmIkOAXa4zj4zxOBHhln7Nef66LizffohrJi5iT46NNTCmNHeuUPYBMJ+S+YUGq+oQVX1HVe8A4tzdkIjEAH8FHnBz+XEiki4i6dnZ2e5uxpgyXd6lAdNv7MGuw8cZ
+spCNlVhMj5jAlWFRSAiIcAKVW2nqo+p6p7S33dn6HIpzSm5oM1KEdlGyRxGP4hISlkLq+okVU1T1bTk5OQz2IwxZTu7RRLv3NKLomLlqgkLWeLBS3oa408qLAJVLQYGVMeGVHW1qtZR1VRVTQV2At1UdW91rN8Yd7Svn8AHt/UhuUYko6Yu5dsNWU5HMsZx7nxGMFtErpQzPOVERGZSct2C1iKyU0TGVCmhMdWsYc0Y3r+1D61S4rjl38uZu8neejTBzZ3ZR48AsUAhcIKS6SVUVeM9H6+EzT5qPOHwsQKGT17Cluw8plyfRt9W9hakCSzVNvuoqtZQ1RBVjVDVeNfXXisBYzwlMSaCGTf3pHlyHGPfSGfBZrsWsglO7pw19DsRSSj1daKIXOHZWMZ4R83YkjJomhTLzW8sY+EWKwMTfNz5jOBB13WKAVDVw8CDnotkjHfVcpVB41ox3PT6Mhbb2UQmyLhTBGUtU9ElLo3xO7XjInlrbC8a1YzhxteW2amlJqi4UwTpIvKMiDQXkWYi8iyw3NPBjPG2JFcZ1E+M4sbpy1i2rawZUowJPO4UwR1AAfAOJTORHgdu92QoY5ySXCOSmWN7kRIfxehpS1m+3crABD53zho6qqp/PjXKV1X/oqpHvRHOGCfUiY9i5rhe1ImP4oZpy/jhp0NORzLGo9yadM6YYFM3PoqZY3tROy6CG6YuZcWOw05HMsZjrAiMKUdKQkkZ1IyNYNTUJazaaWVgApMVgTEVqJ8YzcxxvUiMCWfklCWs2ZVT+ZOM8TPuDChrKCIfiki2iOwTkfdFpKE3whnjCxokRjNzbC9qRIUzwsrABCB3jgheAz6h5LrFDYBPXY8ZEzQa1ozh7XG9iIsMY+TUJazbnet0JGOqjTtFkKyqr6lqoes2HbDZuUzQaVQrhpljexEdHsqIKYtZv8fKwAQGd4pgv4iMFJFQ120kYMMuTVBqXLvkyCAyLJQRU5awca9d6cz4P3eK4CbgGmAvsAe4CrjRk6GM8WVNascyc1wvwkOF4ZMXs9kue2n8nDtF0Mh1neJk1xXGrgAaeTqYMb6saVIsb43tRUiIMGzyErZk5TkdyZgqc6cIXnTzMWOCSvPkOGaO7QXAsMmLyci2MjD+qdwiEJHeInIPkCwid5e6PQSEei2hMT6sRZ04Zo7tiaoybNJiMq0MjB+q6IggAoijZMrpGqVuuZR8TmCMAVrWrcFbY3tRVKwMm7yYbfttKi7jX9y5ZnETVd3upTxlsmsWG3+wYW8uwycvITIshLfH9aJJ7VinI5kgV53XLHa0BIzxF21S4vn3mJ4cP1nEsEmL2XHwmNORjHGLzTVkTDVqVz+eGTf35GhBEddZGRg/YUVgTDVrXz+BGTf35MiJkwybvJidh6wMjG+rsAhE5HwR+UBE1rpu/xGR87yUzRi/1aFBAjNu7kXO8ZIy2H34uNORjClXRaePDgKmUTLJ3HBgBPA5ME1EBnonnjH+q2PDBP49pieHj5aUwZ4cKwPjmyo6IrgPuMI14dxKVV2hqtOAK4A/eSeeMf6tc6NE3hjTgwN5BQyfvIS9OSecjmTMr1RUBCmquvL0B1V1FVC3shWLyDQRyRKRNaUe+4eIrBKRFSIyW0TqVy22Mf6ja+OavH5TD7JyTzB88mKycq0MjG+pqAgqGhXjzoiZ6UD/0x57UlU7qWoX4DPgATfWY4zf696kpAz25p7gusmLyTpiZWB8R0VF0FxEPinj9inQrLIVq+o84OBpj5WewD0WqHg0mzEBJC21FtNv7MHenBMMn7yE7CP5TkcyBqhgZLGI9Kvoiao6t9KVi6QCn6lqh1KP/RO4HsgBzlfV7HKeOw4YB9C4cePu27fbuDYTGBZnHuDG15bRqFY0b43tRVJcpNORTIByd2RxpVNMlFphONAB2KWqWW4+J5XTiqDU9+4HolT1wcrWY1NMmECzMGM/N01fRmrtWGbc3JPaVgbGA37zFBMiMlFE2rvu
JwArgTeAH0VkWDVkfAu4shrWY4zf6dM8iak3nMXW/UcZOXUpxwuKnI5kglhFnxGcq6prXfdvBDapakegO/D/qrIxEWlZ6sshwIaqrMeYQHB2iyQmjuzO+j25PP7FeqfjmCBWUREUlLp/MfARgKrudWfFIjITWAS0FpGdIjIGeFxE1ojIKuAS4I9Vi21MYDi/TR1uPDuV1xdtZ+6mMj8uM0Hq8LECRr+2lDW7cjy+rYqK4LCIXCYiXYGzgS8BRCQMiK5sxao6TFXrqWq4qjZU1amqeqWqdnCdQjpYVXdVz3/DGP/1p/5taFknjvveW8mhowWVP8EEhTcXbee7jdmEhYrHt1VREdwC/B54Dbiz1JHAhcAsTwczJlhEhYfy3HVdOHSsgL98uBp3T+AwgevEySKmL9zG+a2TaZMS7/HtlVsEqrpJVfurahdVnV7q8a9U9R6PJzMmiLSvn8DdF7fmizV7ef8HO1AOdu+l7+DA0QLG92vule2FlfcNEXmhoieq6h+qP44xwWtc32Z8uyGLhz5ZS8+mtWhUK8bpSMYBhUXFTJqfSdfGifRoWssr26zoraHxwDnAbiAdWH7azRhTjUJDhKev6QzA3e+uoKjY3iIKRp+v2cuOg8cZ3685Ip7/fAAqLoJ6wCTgUmAUEA58oqqvq+rr3ghnTLBpVCuGh4e0Z9m2Q7w6L8PpOMbLVJWJ32XQPDmWi9tWOrdntanoM4IDqjpRVc8HRgOJwFoRGeWtcMYEo6HdGjCwYwrPztnklVMHje+Yv3k/6/bkckvf5oSEeOdoANy4VKWIdAPuBEYCX2BvCxnjUSLCP6/oSM2YCO58ZwUnTtqo42AxcW4GdeMjubyrd2for2iKiYdFZDlwNzAXSFPVMaq6zmvpjAlSNWMjeOrqzmzJyuPxL2wAfjBYtfMwCzMOcNPZTYkMC/Xqtis6Ivg7kAB0Bh4DfnBdVGa1a2SwMcaD+rZKZnSfVKYv3Mb8zTbqONBNnJtBjagwhvds7PVtl3v6KNDUaymMMWX684A2LNiyn3vfW8lXd/YlMSbC6UjGA7btP8oXa/Yyvl9zakSFe337FX1YvL2sG7CTktNKjTEeFhUeynPXduFAXgF//XCNjToOUJPmZxIeGsKNZ6c6sv2KPiOIF5H7ReQlEblEStwBZALXeC+iMcGtQ4ME7rq4FbNW7+HDH23UcaDJOnKC/yzfyZXdGlKnRpQjGSr6jOBNoDWwGrgZmA1cBVyuqpd7IZsxxmV8v+aclVqTBz5ey46Dx5yOY6rR9O+3cbKomHF9K70CsMdUVATNVHW0qr4KDAPSgMtUdYV3ohljTgkNEZ65pgsA97y70kYdB4gjJ07y5uLtDOiQQtOkWMdyVFQEJ0/dUdUiYKuqHvF8JGNMWRrViuHBwe1Yuu0gk+ZlOh3HVIOZS3/iyIlCr00uV56KiqCziOS6bkeATqfui0iutwIaY352VfeG9G+fwjNzNtqoYz+XX1jE1AVb6dO8Np0aJjqapaKzhkJVNd51q6GqYaXue36CbGPMr4gI/ze0I4kxEdxlo4792sc/7mZfbr7jRwPgxhQTxhjfUis2giev6sTmrDye+NJGHfuj4mJl4rwM2tWL59yWSU7HsSIwxh+d17oO1/duwmvf26hjfzRn/T4ys49yS79mXptquiJWBMb4qfsHtKV5ciz3vreSw8fsWsf+QlWZODeDRrWiGdSxntNxACsCY/xWdEQoz13btWTU8Uc26thfLN16kB9/OszYc5sRFuobL8G+kcIYUyUdGyZw50UtmbVqDx+tsFHH/mDi3AxqxUZwdfdGTkf5HysCY/zc+H7N6d6kJg98tJadh2zUsS/bsDeXbzdmM7pPKtER3p1quiJWBMb4ubDQEJ69pgvFqjbq2MdNmptJTEQo1/du4nSUX7AiMCYANK4dw4ND2rNk60GmzLdRx75o1+HjfLJyN9ed1djnphO3IjAmQFzdvSGXtq/LU7M3sm63Df73NacK
+uZzfe9SLx4rAhGZJiJZIrKm1GNPisgG15XOPhQRZ8dVGxNARITHhnYiMSaCO9/50UYd+5BDRwt4e+kOhnSpT/3EaKfj/IonjwimA/1Pe2wO0EFVOwGbgPs9uH1jgk6t2Aj+dVUnNu3L48mvNjodx7i8sWg7x08W+cR0EmXxWBGo6jzg4GmPzVbVQteXi4GGntq+McHq/NZ1GNWrCVMXbOX7LfudjhP0jhcU8fqibVzYpg6t6tZwOk6ZnPyM4Cbgi/K+KSLjRCRdRNKzs20IvTFn4i8D29IsOZZ73l1JzrGTlT/BeMy76Ts4eLSA8ef55tEAOFQEIvJXoBCYUd4yqjpJVdNUNS05Odl74YwJACWjjruwPy+fv328pvInGI8oLCpm8vxMujVOJK1JTafjlMvrRSAiNwCXASPUxsQb4zGdGibyxwtb8unK3Xxso44dMWv1HnYeOs74fs19YnK58ni1CESkP/AnYIiq2hBIYzzs1vOa061xIn/7aA27Dh93Ok5QKZlcLpMWdeK4qG1dp+NUyJOnj84EFgGtRWSniIwBXgJqAHNEZIWITPTU9o0xrlHH13ahuFi5992VFNuoY6+Zuymb9XtyGde3GSEhvns0ABDmqRWr6rAyHp7qqe0ZY8rWpHYsDwxux5/eX83UBVsZ27eZ05GCwsS5GaTER3FFlwZOR6mUjSw2Jghck9aIS9rV5cmvNrJ+j4069rQVOw6zOPMgY85pSkSY77/M+n5CY8xvVjLquCPx0eF2rWMveHVuBvFRYQzr2djpKG6xIjAmSNSOi+RfV3Vkw94jPD3bRh17SmZ2Hl+u3cuo3k2Ii/TYu+/VyorAmCByQZu6jOjZmCkLtrIww0Yde8Lk+ZmEh4Ywuo/vTS5XHisCY4LMXwe1JbV2LPe+u5Kc4zbquDpl5Z7g/eW7uLp7Q5JrRDodx21WBMYEmZiIMJ69tgv7juTzgI06rlbTvt9GYXEx4/zszCwrAmOCUJdGifzhgpZ8vMJGHVeX3BMnmbF4OwM61qNJ7Vin45wRKwJjgtTt5zenq2vU8W4bdfybvbXkJ47kFzK+r+9OLlceKwJjgtSpax0XFZdc69hGHVddfmER0xZs5ewWtenYMMHpOGfMisCYIJaaFMswzRZ5AAAMXklEQVTfL2vHoswDTLZrHVfZhz/sIutIvs9eeKYy/nGSqzHGY647qxHfbsjisS82UKwwvl8zn54p09cUFSuT5mXSvn4857RIcjpOldgRgTFBTkR4cXhXBneuzxNfbuBvH62hsKjY6Vh+Y866vWTuP+rzU01XxI4IjDFEhoXy/LVdaJAYzcS5GezJOcGLw7oS6ycjY52iqkyYm0njWjEM6JDidJwqsyMCYwwAISHCnwe04R9XdOC7jVlcN2kxWUdOOB3Lpy3ZepCVOw4ztm8zwkL99+XUf5MbYzxiVK8mTL4+jS1ZeQx9ZSFbsvKcjuSzJs7NICkugqu7N3Q6ym9iRWCM+ZUL29bl7XG9OHGyiCsnLGTp1oNOR/I56/fk8t3GbEb3SSUqPNTpOL+JFYExpkydGyXy4W1nUzsugpFTlvDpyt1OR/Ipr87NIDYilFG9Up2O8ptZERhjytWoVgzvj+9D50YJ3DHzR16dm4GqDTzbcfAYn67aw7AejUmICXc6zm9mRWCMqVDN2AjeHNOTQZ3q8dgXG3jg47UUBfko5KkLtiLAmHP9Z6rpiti5YcaYSkWFh/LidV1pmBjNq/My2ZNznBeGdSUmIvheQg4eLeDtZT9xeZcG1EuIdjpOtbAjAmOMW0JChPsHtuWRy9vzzYYshk1aTPaRfKdjed3rC7dx4mQx4/v511TTFbEiMMacket7pzJxZHc27jvC0Anfk5EdPKeXHiso5PVF27iobR1a1q3hdJxqY0VgjDljl7RPYebYXhzLLzm9dNm24Di99J1lOzh87KTfTi5XHisCY0yVdG1ckw9u60PNmAhGTFnCrFV7nI7kUSeLipkyfytp
TWqSllrL6TjVyorAGFNlTWrH8sGtfejYIIHb3/qByfMyA/b00s9W7WbX4eMBdzQAVgTGmN+oZmwEM27uycCOKfzz8/U8/Om6gDu9VFV5dW4mLevEcUGbOk7HqXYeKwIRmSYiWSKyptRjV4vIWhEpFpE0T23bGONdUeGhvDSsGzef05TpC7dx67+Xc7ygyOlY1ea7Tdls2HuEW/o1JyTEP6earognjwimA/1Pe2wNMBSY58HtGmMcEBIi/O2ydjw4uB1z1u9j2OTF7M8LjNNLJ36XQb2EKIZ0ru90FI/wWBGo6jzg4GmPrVfVjZ7apjHGeTee3ZSJI7uzfk8uQ19ZSKafn17640+HWLL1IGPOaUpEWGC+m+6z/ysRGSci6SKSnp2d7XQcY8wZuLR9CjPH9SIvv5ArJyxk+Xb/Pb104twMEqLDGdajsdNRPMZni0BVJ6lqmqqmJScnOx3HGHOGujWuyQe39iEhOpzhk5fwxWr/O700IzuP2ev2cX3vJgF9tTafLQJjjP9LTYrlg9vOpn39eG576wemLtjqdKQzMmluJhGhIdzQJ9XpKB5lRWCM8ahasRG8NbYXl7ZL4R+frePhT/1j9tJ9uSf48MddXJ3WkKS4SKfjeJQnTx+dCSwCWovIThEZIyK/E5GdQG9gloh85antG2N8R1R4KC+P6MZNZzflte+3cfuMHzhx0rdPL522YCuFxcWMOzfwBpCdzmNveqnqsHK+9aGntmmM8V2hIcIDg9vRoGY0j85ax7DJi5lyfRq1ffCv7ZzjJ5mx5CcGdqxH49oxTsfxOHtryBjjVWPOacorw7uxbncuV05YyLb9R52O9CszlmwnL78wIKeTKEvgfgxujPFZAzrWo058JDe/ns7QCQuZckMa3RrX9Mq2T5wsIvtIPvvz8jmQV8D+vHzXrYDsvHz2H8lnza4czm2ZRIcGCV7J5DTxhwmi0tLSND093ekYxphqtnX/UUa/tpS9OSd4/rqu9O+QcsbrUFXy8gvZn1fAAdeLenZeAfuP5P/vRf7nF/wC8vILy1xPjagwkuMiSYqLpE58JHdd3IrmyXG/9b/oKBFZrqqVTudjRWCMcdSBvHzGvJ7Oyp2HefCydow+uymqyuFjJ//34v3zX+357D/i+vrozy/2+YXFv1qvCNSMiSApLoLasZEk1YgkKS6CpLjIkhf8GiX3a8dFUjs2gqjwUAf+955lRWCM8RvHC4r449s/MnvdPpJrRHLoaAGFZZxiGhoi1IoteQFPiotwvaBH/urFPjkuklqxEYSFBvfHoO4WgX1GYIxxXHREKBNGdufVeRlszT7qekE//cU+ksTo8ICc/dNpVgTGGJ8QGiLcdl4Lp2MEpeA+bjLGGGNFYIwxwc6KwBhjgpwVgTHGBDkrAmOMCXJWBMYYE+SsCIwxJshZERhjTJDziykmRCQb2F7FpycB+6sxjr+z/fEz2xe/ZPvjlwJhfzRR1Uov+u4XRfBbiEi6O3NtBAvbHz+zffFLtj9+KZj2h701ZIwxQc6KwBhjglwwFMEkpwP4GNsfP7N98Uu2P34paPZHwH9GYIwxpmLBcERgjDGmAgFTBCLSX0Q2isgWEflzGd+PFJF3XN9fIiKp3k/pHW7si7tFZJ2IrBKR/4pIEydyektl+6PUcleJiIpIQJ8p4s7+EJFrXD8ja0XkLW9n9BY3flcai8i3IvKj6/dloBM5PU5V/f4GhAIZQDMgAlgJtDttmduAia771wHvOJ3bwX1xPhDjun9roO4Ld/eHa7kawDxgMZDmdG6Hfz5aAj8CNV1f13E6t4P7YhJwq+t+O2Cb07k9cQuUI4IewBZVzVTVAuBt4PLTlrkceN11/z/AhSISiNe8q3RfqOq3qnrM9eVioKGXM3qTOz8bAP8A/gWc8GY4B7izP8YCL6vqIQBVzfJyRm9xZ18oEO+6nwDs9mI+rwmUImgA7Cj19U7XY2Uuo6qFQA5Q2yvpvMudfVHaGOALjyZyVqX7
Q0S6Ao1U9TNvBnOIOz8frYBWIvK9iCwWkf5eS+dd7uyLh4CRIrIT+By4wzvRvCtQrllc1l/2p58O5c4ygcDt/6eIjATSgH4eTeSsCveHiIQAzwKjvRXIYe78fIRR8vbQeZQcLc4XkQ6qetjD2bzNnX0xDJiuqk+LSG/gTde+KPZ8PO8JlCOCnUCjUl835NeHcP9bRkTCKDnMO+iVdN7lzr5ARC4C/goMUdV8L2VzQmX7owbQAfhORLYBvYBPAvgDY3d/Vz5W1ZOquhXYSEkxBBp39sUY4F0AVV0ERFEyB1FACZQiWAa0FJGmIhJByYfBn5y2zCfADa77VwHfqOsToABT6b5wvRXyKiUlEKjv/55S4f5Q1RxVTVLVVFVNpeQzkyGqmu5MXI9z53flI0pOKEBEkih5qyjTqym9w5198RNwIYCItKWkCLK9mtILAqIIXO/5/x74ClgPvKuqa0XkEREZ4lpsKlBbRLYAdwPlnkboz9zcF08CccB7IrJCRE7/4Q8Ybu6PoOHm/vgKOCAi64BvgftU9YAziT3HzX1xDzBWRFYCM4HRgfgHpI0sNsaYIBcQRwTGGGOqzorAGGOCnBWBMcYEOSsCY4wJclYExhgT5KwIjDEmyFkRmIAnIrVd4yVWiMheEdlV6uuFHtpmVxGZUsXnvi0igTiS1/goG0dggoqIPATkqepTHt7Oe8CjqrqyCs/tB4xU1bHVn8yYX7MjAhPURCTP9e95IjJXRN4VkU0i8riIjBCRpSKyWkSau5ZLFpH3RWSZ63Z2GeusAXQ6VQIi8pCITBOR70QkU0T+4Ho8VkRmichKEVkjIte6VjEfuMg1J5YxHmc/aMb8rDPQlpLJCDOBKaraQ0T+SMn0w3cCzwPPquoCEWlMyfQEbU9bTxqw5rTH2lAyf08NYKOITAD6A7tVdRCAiCQAqGqxayqUzsDy6v9vGvNLVgTG/GyZqu4BEJEMYLbr8dW4JmEDLgLalbqmUbyI1FDVI6XWU49fT0w2yzXLa76IZAF1Xet9SkSeAD5T1fmlls8C6mNFYLzAisCYn5Wejru41NfF/Py7EgL0VtXjFaznOCWzVJa37iIgTFU3iUh3YCDwmIjMVtVHXMtEudZjjMfZZwTGnJnZlMxYCYCIdCljmfVAi8pWJCL1gWOq+m/gKaBbqW+3Atb+tqjGuMeOCIw5M38AXhaRVZT8/swDxpdeQFU3iEhCGW8Zna4j8KSIFAMngVsBRKQucPzU21TGeJqdPmqMB4jIXcARVT3jsQSu5+aq6tTqT2bMr9lbQ8Z4xgR++bnAmTgMvF6NWYypkB0RGGNMkLMjAmOMCXJWBMYYE+SsCIwxJshZERhjTJCzIjDGmCD3/wGbljGrhVZKJQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "time_range = mol.fstep*np.arange(len(proj))\n",
    "plt.plot(time_range, proj)\n",
    "_ = plt.xlabel('Time (ns)')\n",
    "_ = plt.ylabel('RMSD to crystal')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Projecting a list of MD simulations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the arrival of Markov state models, the paradigm of running one single long simulation has shifted to running hundreds or thousands of shorter simulations."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To help analyse a large number of MD trajectories, HTMD provides the `simlist` function, which bundles all the trajectories together:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Creating simlist: 100%|██████████| 314/314 [00:00<00:00, 2580.37it/s]\n"
     ]
    }
   ],
   "source": [
    "sims = simlist(glob('data/1/filtered/*/'), 'data/1/filtered/filtered.pdb')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Using the Metric class to project a list of simulations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using the `Metric` class, one can calculate all the projections of an entire simlist at once. In this case, the `MetricDistance` projection class is used, which calculates the matrix of distances between two selections (`sel1` and `sel2`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Projecting trajectories: 100%|██████████| 313/313 [00:56<00:00,  5.57it/s]\n",
      "2018-03-15 23:58:31,995 - htmd.projections.metric - INFO - Frame step 9.999999974752427e-07ns was read from the trajectories. If it looks wrong, redefine it by manually setting the MetricData.fstep property.\n"
     ]
    }
   ],
   "source": [
    "metr = Metric(sims)\n",
    "metr.set(MetricDistance(sel1='protein and name CA', sel2='resname MOL and noh', metric='distances'))\n",
    "data = metr.project()\n",
    "data.fstep = 0.1  # frame step in ns, set manually to override the value read from the trajectories"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, the projection is the matrix of distances between each alpha carbon of the protein and each heavy atom of the MOL molecule (which, in these trajectories, is a ligand binding to the protein)."
   ]
  },
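  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the shape of such a projection concrete, here is a minimal NumPy sketch of a per-frame distance matrix flattened into one row per frame. It is an illustration under simplifying assumptions, not HTMD's code: it ignores periodic boundary conditions, the helper `pairwise_distances` and the random toy coordinates are hypothetical, and the column order is an arbitrary choice of this sketch. With 3 atoms in the first selection and 2 in the second, each frame yields 3 x 2 = 6 distances."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def pairwise_distances(sel1_xyz, sel2_xyz):\n",
    "    # sel1_xyz: (n_frames, n1, 3), sel2_xyz: (n_frames, n2, 3)\n",
    "    diff = sel1_xyz[:, :, None, :] - sel2_xyz[:, None, :, :]\n",
    "    dist = np.linalg.norm(diff, axis=-1)      # (n_frames, n1, n2)\n",
    "    return dist.reshape(dist.shape[0], -1)    # (n_frames, n1 * n2)\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "toy_sel1 = rng.random((5, 3, 3))  # 5 frames, 3 atoms\n",
    "toy_sel2 = rng.random((5, 2, 3))  # 5 frames, 2 atoms\n",
    "toy_proj = pairwise_distances(toy_sel1, toy_sel2)\n",
    "print(toy_proj.shape)"
   ]
  },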
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "The `Metric.project` method returns a `MetricData` object which contains the trajectory and projection information:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MetricData object with 313 trajectories of 6260.0ns aggregate simulation time\n",
      "Centers: None\n",
      "K: None\n",
      "N: None\n",
      "description: <class 'pandas.core.frame.DataFrame'> at 0x7f424512be80\n",
      "fstep: 0.1\n",
      "parent: <class 'NoneType'> at 0x7f42aa8aba70\n",
      "trajectories shape: (313,)\n"
     ]
    }
   ],
   "source": [
    "print(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Working with MetricData objects"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `MetricData` object contains the information on each trajectory, including the corresponding projection. For example, for simulation 14:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "sim: simid = 14\n",
      "projection: np.array(shape=(200, 2493))\n",
      "reference: np.array(shape=(200, 2))\n",
      "cluster: None\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(data.trajectories[14])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "The projection data of all trajectories can be accessed through the `MetricData.dat` property, an array with one entry per trajectory. We can check the projection data of simulation 14 with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 25.02148819,  22.95205688,  22.94793129, ...,  32.08512497,\n",
       "         35.54423523,  36.99856567],\n",
       "       [ 26.67520523,  23.7568512 ,  23.29753304, ...,  32.78013992,\n",
       "         36.31970978,  37.90874863],\n",
       "       [ 25.62865257,  23.9732132 ,  23.40108299, ...,  34.10222626,\n",
       "         37.37914276,  38.52599335],\n",
       "       ..., \n",
       "       [ 42.73937607,  43.45667648,  40.32186127, ...,  42.97974014,\n",
       "         40.95644379,  42.0711174 ],\n",
       "       [ 37.95848846,  40.92785263,  39.53430176, ...,  38.68696976,\n",
       "         37.77161789,  39.62480164],\n",
       "       [ 42.03954697,  42.95910263,  42.93307495, ...,  42.20518112,\n",
       "         41.23760986,  42.07869339]], dtype=float32)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.dat[14]  # equivalent to data.trajectories[14].projection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(313,)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.dat.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(62600, 2493)\n"
     ]
    }
   ],
   "source": [
    "bundledprojs = np.vstack(data.dat) # np.concatenate can be used instead as well\n",
    "print(bundledprojs.shape)"
   ]
  },
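  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The stacking step can be tried on toy data. In the sketch below, an object array of three small 2D arrays stands in for `data.dat`; the names `toy_dat` and `toy_bundled` and the frame counts are made up for illustration. It also shows one possible way to recover which trajectory a row of the stacked array came from, which the tutorial itself does not rely on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Object array holding one (n_frames, n_dims) array per toy trajectory\n",
    "toy_dat = np.empty(3, dtype=object)\n",
    "toy_dat[0] = np.zeros((4, 2))\n",
    "toy_dat[1] = np.ones((3, 2))\n",
    "toy_dat[2] = np.full((5, 2), 2.0)\n",
    "\n",
    "# Stack all frames along the first axis\n",
    "toy_bundled = np.vstack(toy_dat)\n",
    "print(toy_dat.shape, toy_bundled.shape)\n",
    "\n",
    "# Map a row of the stacked array back to the trajectory it came from\n",
    "lengths = [len(traj) for traj in toy_dat]\n",
    "ends = np.cumsum(lengths)  # exclusive end row of each trajectory\n",
    "row = 4\n",
    "traj_index = int(np.searchsorted(ends, row, side='right'))\n",
    "print(traj_index)"
   ]
  },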
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Operations and calculations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One can use NumPy functions such as `np.min`, `np.max` and `np.average` to do calculations over the projected data:\n",
    "* get the global minimum over all data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2.92014\n"
     ]
    }
   ],
   "source": [
    "global_minimum = np.min(bundledprojs)\n",
    "print(global_minimum)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* get the maximum distance in each frame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(62600,)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "array([ 54.39825058,  53.99809647,  55.22867203, ...,  40.54018021,\n",
       "        39.33503342,  37.30210876], dtype=float32)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "max_eachframe = np.max(bundledprojs, axis=1)\n",
    "print(max_eachframe.shape)\n",
    "max_eachframe"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "* or get the average of each pair across all frames:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(2493,)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "array([ 34.95511627,  33.69743729,  33.09692001, ...,  32.52411652,\n",
       "        33.0621109 ,  33.91365433], dtype=float32)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "avg_acrossframes = np.average(bundledprojs, axis=0)\n",
    "print(avg_acrossframes.shape)\n",
    "avg_acrossframes"
   ]
  },
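  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the difference between the two reductions above concrete, the following toy example applies them to a tiny array with 3 frames (rows) and 2 pairs (columns). The array `toy` and its values are invented for illustration only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Toy projection: 3 frames (rows) x 2 pairs (columns)\n",
    "toy = np.array([[1.0, 4.0],\n",
    "                [2.0, 6.0],\n",
    "                [3.0, 8.0]])\n",
    "\n",
    "per_frame_max = np.max(toy, axis=1)     # collapses the pair axis: one value per frame\n",
    "per_pair_avg = np.average(toy, axis=0)  # collapses the frame axis: one value per pair\n",
    "print(per_frame_max)\n",
    "print(per_pair_avg)"
   ]
  },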
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `axis` argument selects the dimension over which the operation is performed:\n",
    "\n",
    "* `axis=1`, in this case, operates over all pairs, to give a result for each frame\n",
    "* `axis=0`, in this case, operates over all frames, to give a result for each pair\n",
    "\n",
    "But how do we know which atoms of the system a given pair index corresponds to?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Playing with the projection mapping"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The mapping (meaning of each index) of the projection can be obtained from the `MetricData.map` property:\n",
    "\n",
    "* the mapping is a pandas `DataFrame` object\n",
    "\n",
    "The `head` method of `DataFrame` can be used to print the top entries of the mapping:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>atomIndexes</th>\n",
       "      <th>description</th>\n",
       "      <th>type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[4, 4480]</td>\n",
       "      <td>distance between GLU 1 CA and MOL 278 C3</td>\n",
       "      <td>distance</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>[19, 4480]</td>\n",
       "      <td>distance between ALA 2 CA and MOL 278 C3</td>\n",
       "      <td>distance</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>[29, 4480]</td>\n",
       "      <td>distance between ASP 3 CA and MOL 278 C3</td>\n",
       "      <td>distance</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>[41, 4480]</td>\n",
       "      <td>distance between CYX 4 CA and MOL 278 C3</td>\n",
       "      <td>distance</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>[51, 4480]</td>\n",
       "      <td>distance between GLY 5 CA and MOL 278 C3</td>\n",
       "      <td>distance</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  atomIndexes                               description      type\n",
       "0   [4, 4480]  distance between GLU 1 CA and MOL 278 C3  distance\n",
       "1  [19, 4480]  distance between ALA 2 CA and MOL 278 C3  distance\n",
       "2  [29, 4480]  distance between ASP 3 CA and MOL 278 C3  distance\n",
       "3  [41, 4480]  distance between CYX 4 CA and MOL 278 C3  distance\n",
       "4  [51, 4480]  distance between GLY 5 CA and MOL 278 C3  distance"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.map.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "One can access particular elements of the mapping through:\n",
    "\n",
    "* `iloc` - i(nteger)loc(ation):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>atomIndexes</th>\n",
       "      <th>description</th>\n",
       "      <th>type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>[321, 4480]</td>\n",
       "      <td>distance between ARG 21 CA and MOL 278 C3</td>\n",
       "      <td>distance</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>[345, 4480]</td>\n",
       "      <td>distance between GLU 22 CA and MOL 278 C3</td>\n",
       "      <td>distance</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    atomIndexes                                description      type\n",
       "20  [321, 4480]  distance between ARG 21 CA and MOL 278 C3  distance\n",
       "21  [345, 4480]  distance between GLU 22 CA and MOL 278 C3  distance"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.map.iloc[20:22]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* `loc` (label-based, using the DataFrame index):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "atomIndexes                                  [321, 4480]\n",
       "description    distance between ARG 21 CA and MOL 278 C3\n",
       "type                                            distance\n",
       "Name: 20, dtype: object"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.map.loc[20]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* or direct access to the elements of a column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'distance between ARG 21 CA and MOL 278 C3'"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.map['description'][20]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "One can use the mapping to find out which pair of atoms the minimum distance found in the trajectories corresponds to."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `np.where` function returns the array indices at which a given condition holds (in this case, where the distance equals the minimum):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2.92014\n",
      "[2196]\n",
      "[ 2.9201436]\n"
     ]
    }
   ],
   "source": [
    "print(global_minimum)\n",
    "frame, index = np.where(bundledprojs == global_minimum)\n",
    "print(index)\n",
    "print(bundledprojs[frame, index]) # confirm that these indices do point to the minimum value"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, looking up this index in the mapping shown before:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>atomIndexes</th>\n",
       "      <th>description</th>\n",
       "      <th>type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2196</th>\n",
       "      <td>[4113, 4497]</td>\n",
       "      <td>distance between GLY 258 CA and MOL 278 N3</td>\n",
       "      <td>distance</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       atomIndexes                                 description      type\n",
       "2196  [4113, 4497]  distance between GLY 258 CA and MOL 278 N3  distance"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.map.iloc[index]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "gives us the atom indexes and description of the pair."
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
